Estimating Cache Performance for Sequential and Data Parallel Programs

Author

  • Thomas Fahringer
Abstract

This paper introduces an analytical model that enables automatic estimation of the cache performance for both sequential and data parallel Fortran programs. The estimation is based on a classification of array accesses with respect to cache reuse at the source code level. An estimated upper bound of the number of distinct cache lines accessed inside of a loop is statically computed. Based on this estimate the number of cache misses for loops, procedures and the entire program can be predicted. The method has been implemented as part of P3T (Parameter based Performance Prediction Tool) and successfully supports VFCS (Vienna Fortran Compilation System) in guiding the application of data distributions and program transformations on distributed memory multiprocessor systems to achieve greater cache effectiveness. Experiments are presented that demonstrate the efficacy of our approach with very encouraging experimental results.
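
The idea of statically bounding the number of distinct cache lines touched in a loop can be illustrated with a small sketch. This is not the paper's classification scheme or P3T's implementation; it is a simplified illustration for a single strided array reference, written in Python for brevity. The function names, the default element size of 8 bytes, and the 64-byte line size are all assumptions, as are the simplifications that elements are aligned to their own size, that the element size divides the line size, and that the cache starts cold with no capacity or conflict evictions.

```python
def distinct_lines_upper_bound(trip_count, stride_elems, elem_size=8, line_size=64):
    """Upper bound on distinct cache lines touched by A(base + i*stride), i = 0..trip_count-1.

    Hypothetical helper: assumes element-aligned data, elem_size dividing
    line_size, and an unknown alignment of the first element within its line.
    """
    if trip_count <= 0:
        return 0
    stride_bytes = abs(stride_elems) * elem_size
    # Bytes from the start of the first element to the end of the last one.
    span_bytes = (trip_count - 1) * stride_bytes + elem_size
    # Worst case: the first element sits in the last slot of its cache line.
    worst_offset = line_size - elem_size
    lines_in_span = (worst_offset + span_bytes - 1) // line_size + 1
    # Each iteration touches one line, so the trip count is also a bound.
    return min(trip_count, lines_in_span)


def estimated_cold_misses(accesses, elem_size=8, line_size=64):
    """Sum the per-reference bounds for (trip_count, stride_elems) pairs.

    Assumes a cold cache and no reuse between references, so every
    distinct line counts as exactly one miss.
    """
    return sum(
        distinct_lines_upper_bound(n, s, elem_size, line_size) for n, s in accesses
    )
```

Under these assumptions, a 100-iteration loop reading 8-byte elements with unit stride touches at most 14 lines, while the same loop with a stride of 8 elements touches up to 100, one new line per iteration.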

Similar resources

A Preliminary Evaluation of Cache-miss-initiated Prefetching Techniques in Scalable Multiprocessors

Prefetching is an important technique for reducing the average latency of memory accesses in scalable cache-coherent multiprocessors. Aggressive prefetching can significantly reduce the number of cache misses, but may introduce bursty network and memory traffic, and increase data sharing and cache pollution. Given that we anticipate enormous increases in both network bandwidth and latency, we exam...

Coherence Miss Classification for Performance Debugging in Multi-Core Processors

Multi-core processors offer large performance potential for parallel applications, but writing these applications is notoriously difficult. Tuning a parallel application to achieve scalability, referred to as performance debugging, is often more challenging for programmers than conventional debugging for correctness. Parallel programs have several performance related issues that are not seen in...

Hardware Support for Data Dependence Speculation in Distributed Shared-Memory Multiprocessors Via Cache-block Reconciliation

Data dependence speculation allows a compiler to relax the constraint of data-independence to issue tasks in parallel, increasing the potential for automatic extraction of parallelism from sequential programs. This paper proposes hardware mechanisms to support a data-dependence speculative distributed shared-memory (DDSM) architecture that enable speculative parallelization of programs with irr...

DCompose: A Tool for Measuring Data Decomposition on Distributed Memory Multiprocessors

In converting sequential programs for execution on distributed memory parallel processors, the programmer must determine the optimal data decomposition for the data structures. This task is an extremely complex optimisation problem and thus is usually performed manually. This chapter describes an X based visualisation tool called DCompose, which allows a programmer to measure the efficiency of ...

P3T: An Automatic Performance Estimator for Parallel Programs

The area of parallelizing compilers for distributed memory multicomputers has seen considerable research activity during the last few years. Most of the current compilers do not provide any support for estimating performance impacts of code changes that they apply. In this paper, we present P3T, which is a static and automatic performance estimator for data parallel programs. It computes at c...

Publication date: 1997